Hardware and Software Fault Tolerance: Definition and Evaluation of Adaptive Architectures in a Distributed Computing Environment
Abstract
This paper addresses tolerance to both hardware and software faults by defining several hybrid-fault-tolerant architectures, which can co-exist and operate simultaneously on top of the supporting environment, and introduces a systematic method for evaluating their dependability, efficiency, and response time. Because the target is general-purpose distributed systems, where multiple unrelated applications may compete for system resources, the architectural solutions place particular emphasis on adapting the use of redundancy to system conditions.
Similar resources
Hardware and Software Fault Tolerance: Adaptive Architectures in Distributed Computing Environments
This paper discusses the issue of providing tolerance to hardware and software faults in distributed computing environments as well as issues related to efficiency and flexibility. A set of new fault-tolerant architectures is presented, and a detailed dependability analysis of these architectures is performed together with an efficiency and response time evaluation. The proposed architectural s...
An adaptive approach to achieving hardware and software fault tolerance in a distributed computing environment
This paper focuses on the problem of providing tolerance to both hardware and software faults in independent applications running on a distributed computing environment. Several hybrid-fault-tolerant architectures are identified and proposed. Given the highly varying and dynamic characteristics of the operating environment, solutions are developed mainly exploiting the adaptation property. They...
Adaptive Architectures for Hybrid Fault Tolerance in Distributed Computing Systems
This paper discusses the issue of hardware and software fault tolerance in distributed computing environments as well as issues related to efficiency and flexibility. A set of new fault-tolerant architectures is presented, and a detailed dependability analysis of these architectures together with an efficiency evaluation is performed. The proposed architectural solutions are based on the assump...
Improving the palbimm scheduling algorithm for fault tolerance in cloud computing
Cloud computing is the latest technology that involves distributed computation over the Internet. It meets the needs of users through sharing resources and using virtual technology. The workflow user applications refer to a set of tasks to be processed within the cloud environment. Scheduling algorithms have a lot to do with the efficiency of cloud computing environments through selection of su...
Reliability and Performance Evaluation of Fault-aware Routing Methods for Network-on-Chip Architectures (RESEARCH NOTE)
Nowadays, faults and failures are increasing especially in complex systems such as Network-on-Chip (NoC) based Systems-on-a-Chip due to the increasing susceptibility and decreasing feature sizes. On the other hand, fault-tolerant routing algorithms have an evident effect on tolerating permanent faults and improving the reliability of a Network-on-Chip based system. This paper presents reliabili...
Publication date: 2000